FDR-Corrected Sparse Canonical Correlation Analysis with Applications to Imaging Genomics
Abstract
Reducing the number of false-positive discoveries is presently one of the most pressing issues in the life sciences. It is especially important for many applications in neuroimaging and genomics, where datasets are typically high-dimensional, meaning that the number of explanatory variables exceeds the sample size. The false discovery rate (FDR) is a criterion that can be employed to address this issue, and it has therefore gained great popularity as a tool for testing multiple hypotheses. Canonical correlation analysis (CCA) is a statistical technique used to make sense of the cross-correlation between two sets of measurements collected on the same set of samples (e.g., brain imaging and genomic data for the same patients with mental illness), and sparse CCA extends the classical method to high-dimensional settings. Here we propose a way of applying the FDR concept to sparse CCA, together with a method to control the FDR. The proposed FDR correction directly influences the sparsity of the solution, adapting it to the unknown true sparsity level. Theoretical derivation as well as simulation studies show that our procedure indeed keeps the FDR of the canonical vectors below a user-specified target level. We apply the proposed method to an imaging genomics dataset from the Philadelphia Neurodevelopmental Cohort. Our results link brain activity during a working memory task, as measured by functional magnetic resonance imaging (fMRI), to the corresponding subjects’ genomic data. Our findings are supported by previous work on cognitive ability and on neurodevelopmental and other mental disorders.
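As a rough illustration of what an FDR statement about a sparse canonical vector can mean, the sketch below computes the false discovery proportion (FDP) of an estimated vector against a known ground truth, as one would in a simulation study. It assumes that a "discovery" is a nonzero entry of the estimated vector and that a false discovery is such an entry whose true coefficient is zero; this reading, the function name `false_discovery_proportion`, the tolerance `tol`, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import numpy as np


def false_discovery_proportion(u_hat, u_true, tol=1e-12):
    """Share of selected (nonzero) entries of u_hat that are zero in u_true.

    Assumption: selection means |entry| > tol; this is an illustrative
    convention, not the paper's formal definition.
    """
    selected = np.abs(np.asarray(u_hat, dtype=float)) > tol
    true_support = np.abs(np.asarray(u_true, dtype=float)) > tol
    n_selected = int(selected.sum())
    if n_selected == 0:
        # Common convention: no discoveries means no false discoveries.
        return 0.0
    n_false = int((selected & ~true_support).sum())
    return n_false / n_selected


# Toy example: the true canonical vector has two nonzero entries,
# while the estimate selects three features.
u_true = np.array([0.8, 0.6, 0.0, 0.0, 0.0])
u_hat = np.array([0.7, 0.5, 0.1, 0.0, 0.0])
fdp = false_discovery_proportion(u_hat, u_true)
```

Under this reading, the FDR is the expected value of the FDP over repeated samples, so a simulation study would average `fdp` over many simulated datasets and compare the average with the target level.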
Similar resources
Correlating Cellular Features with Gene Expression using CCA
To understand the biology of cancer, joint analysis of multiple data modalities, including imaging and genomics, is crucial. We propose the use of canonical correlation analysis (CCA) and a sparse variant as a preliminary discovery tool for identifying connections across modalities, specifically between gene expression and features describing cell and nucleus shape, texture, and stain intensity...
Sparse CCA: Adaptive Estimation and Computational Barriers
Canonical correlation analysis (CCA) is a classical and important multivariate technique for exploring the relationship between two sets of variables. It has applications in many fields including genomics and imaging, to extract meaningful features as well as to use the features for subsequent analysis. This paper considers adaptive and computationally tractable estimation of leading sparse can...
FDR made easy in differential feature discovery and correlation analyses
SUMMARY Rapid progress in technology, particularly in high-throughput biology, allows the analysis of thousands of genes or proteins simultaneously, where the multiple comparison problem occurs. Global false discovery rate (gFDR) analysis statistically controls this error, computing the ratio of the number of false positives over the total number of rejections. Local FDR (lFDR) method can asso...
Dementia induces correlated reductions in white matter integrity and cortical thickness: A multivariate neuroimaging study with sparse canonical correlation analysis
We use a new, unsupervised multivariate imaging and analysis strategy to identify related patterns of reduced white matter integrity, measured with the fractional anisotropy (FA) derived from diffusion tensor imaging (DTI), and decreases in cortical thickness, measured by high resolution T1-weighted imaging, in Alzheimer's disease (AD) and frontotemporal dementia (FTD). This process is based on...
Minimax Estimation in Sparse Canonical Correlation Analysis
Canonical correlation analysis is a widely used multivariate statistical technique for exploring the relation between two sets of variables. This paper considers the problem of estimating the leading canonical correlation directions in high dimensional settings. Recently, under the assumption that the leading canonical correlation directions are sparse, various procedures have been proposed for...